
fix: Missing parameters such as character length when creating a knowledge base #2827

Merged
shaohuzhang1 merged 1 commit into main from pr@main@fix_create_dataset
Apr 8, 2025
Conversation

@shaohuzhang1
Contributor

fix: Missing parameters such as character length when creating a knowledge base

@f2c-ci-robot

f2c-ci-robot bot commented Apr 8, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@f2c-ci-robot

f2c-ci-robot bot commented Apr 8, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shaohuzhang1 shaohuzhang1 merged commit 3557ea5 into main Apr 8, 2025
4 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@main@fix_create_dataset branch April 8, 2025 09:24
0)}, dataset_id

@staticmethod
def get_last_url_path(url):
Contributor Author


Certainly! Here are some observations and improvements for the provided code:

  1. Code Structure: The save flow has multiple responsibilities (saving, handling associations, and constructing the response) split across two separate functions, save_with_association and get_response. These can be merged or refactored where appropriate.

  2. Bulk Insertion Logic: QuerySet.bulk_create is used correctly, but consider wrapping it in a try-except block to handle errors during bulk insertion, such as database constraint violations.

  3. Response Data:

    • Check that document_model_list is not empty before accessing attributes such as char_length.
    • Use explicit type hints (Dict, List) to clarify the expected input and output types.

  4. User ID Handling: Ensure that user_id is properly set when building the response.
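Points 2 and 3 can be sketched framework-free: a guard that skips empty batches and surfaces constraint errors. Here safe_bulk_create and StorageError are hypothetical stand-ins for a wrapper around Django's bulk_create and its IntegrityError; the real code would catch the ORM's exception instead.

```python
from typing import Callable, List


class StorageError(Exception):
    """Hypothetical stand-in for a database constraint violation."""


def safe_bulk_create(rows: List[dict], bulk_insert: Callable[[List[dict]], None]) -> int:
    """Insert rows in one batch; skip empty input and surface failures."""
    if not rows:  # point 3: guard against an empty list before inserting
        return 0
    try:
        bulk_insert(rows)  # point 2: bulk insertion may violate constraints
    except StorageError as exc:
        raise ValueError(f"bulk insert failed: {exc}") from exc
    return len(rows)
```

The empty-batch guard also avoids issuing a useless INSERT statement when a dataset has no documents yet.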

Here’s an improved version of the code based on the above points:

from functools import reduce
from typing import Any, Dict, List

def save(self, instance: Dict, with_valid=True):
    # Assume DataSetSerializers, DocumentSerializers and ProblemParagraphMapping
    # exist elsewhere in the code
    dataset_responses = {
        'id': instance.get('id')
    }

    problems = None
    if with_valid:
        problems = self._create_problems_with_relation(instance)

    document_mappings = self._generate_document_mappings(instance, with_valid)

    if problems:
        dataset_responses.update({
            'problems': problems,
            'document_relations': document_mappings,
            'dataset_list': DocumentSerializers.Query(
                data={'dataset_id': instance['id']}).list(with_valid=True)
        })

    return dataset_responses

@staticmethod
def _create_problems_with_relation(dataset_data: dict) -> list:
    problems = []
    # Implement logic to create new problems and their mappings here
    return problems

@staticmethod
def _generate_document_mappings(dataset_data: dict, with_valid=True) -> List[dict]:
    document_models = []  # Fetch all document models related to the dataset

    if with_valid:
        mapping_list = [
            ProblemParagraphMapping(doc_id=doc.id, dataset_id=doc.dataset_id)
            for doc in document_models
        ]
        QuerySet(ProblemParagraphMapping).bulk_create(mapping_list)

    document_responses = []
    for doc in document_models:
        document_responses.append({
            'id': str(doc.id),
            'name': doc.name,
            'description': doc.description,
            'user_id': dataset_data.get('user_id'),  # assumes the caller includes it
            # Character length across the document's paragraphs
            'char_length': sum(len(p) for p in doc.paragraphs),
        })
    return document_responses

@staticmethod
def save_with_association(instance: Dict, with_valid=True):
    # Placeholder: build the mappings from the instance's problems/paragraphs
    problem_paragraph_mapping_list = [
        ProblemParagraphMapping(...) for ... in ...
    ]
    if len(problem_paragraph_mapping_list) > 0:
        QuerySet(ProblemParagraphMapping).bulk_create(problem_paragraph_mapping_list)

@staticmethod
def get_response(data: dict, dataset_id, with_valid=True) -> Dict[str, Any]:
    document_list = DocumentListSerializer.data(data)
    return {
        **DataSetSerializers(data).data,
        'document_list': document_list,
        'document_count': len(document_list),
        # Summing a generator is safe for an empty document list (total is 0)
        'char_length': sum(d.char_length for d in document_list)
    }

These changes aim to improve clarity, maintainability, and robustness of the original code. Adjustments might depend heavily on actual context and classes used within the surrounding environment.
